A boxplot shows the 0th, 25th, 50th, 75th, and 100th percentiles of a set of numbers.
If we had a dataset of 10 numbers:
dataset <- c(1,1,3,4,5,5,6,7,8,10)
Basic R functions for boxplots:
quantile(dataset)
## 0% 25% 50% 75% 100%
## 1.00 3.25 5.00 6.75 10.00
boxplot(dataset)
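To see where the pieces of the box come from, the short sketch below recomputes them in base R for the same ten values. One detail worth knowing: `boxplot()` positions the box edges with `fivenum()` (Tukey's hinges), which on small samples can differ slightly from the quartiles that `quantile()` reports. The variable names `q`, `iqr`, `hinges`, and `stats` are only for illustration.

```r
# Same ten values as above
dataset <- c(1, 1, 3, 4, 5, 5, 6, 7, 8, 10)

# Quartiles as quantile() reports them (default interpolation)
q <- quantile(dataset)
iqr <- IQR(dataset)            # 75th minus 25th percentile

# Hinges used by boxplot() for the box edges
hinges <- fivenum(dataset)

# Values farther than 1.5 box heights from the box are flagged as outliers
stats <- boxplot.stats(dataset)

print(q)
print(hinges)
print(stats$out)               # empty here: no value is far enough from the box
```

Here the box drawn by `boxplot()` runs from 3 to 7 (the hinges), while `quantile()` gives 3.25 and 6.75.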
A tutorial on introductory boxplots using R can be found here: https://www.statmethods.net/graphs/boxplot.html
data <- read.csv("EMSL_PNNL.csv", header=TRUE)
data_ga <- data[36,]
ggplot2 is an effective package in R that allows for plotting data in a clean and pretty manner. This includes boxplots, violin plots, line graphs, histograms, and more.
ggplot plots two variables in a dataset as x and y of a grid. Thus, the x and y must be columns in a dataset.
In our current dataset “data_ga”, we want to plot 4 boxplots with approximately 5-10 values in each boxplot.
So, the x column must include character values of “Mineral Control”, “Mineral Heated”, “Organic Control”, and “Organic Heated”. The y column must include the frequencies.
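The reshaping step itself is not shown here, so the following is only a sketch of one way to get from a single wide row to the long `type` / `frequencies` layout using base R. The toy row, its column names (for example `Mineral.Control.1`), and the values are all hypothetical; the real columns in EMSL_PNNL.csv may be named differently.

```r
# Hypothetical stand-in for data_ga: one row, one column per sample
toy_row <- data.frame(
  Mineral.Control.1 = 12, Mineral.Control.2 = 15,
  Mineral.Heated.1  = 30, Mineral.Heated.2  = 28,
  Organic.Control.1 = 9,  Organic.Control.2 = 11,
  Organic.Heated.1  = 40, Organic.Heated.2  = 37
)

# Long format: one row per measurement
toy_long <- data.frame(
  type        = sub("\\.[0-9]+$", "", names(toy_row)),  # drop the replicate suffix
  frequencies = as.numeric(unlist(toy_row[1, ]))        # the measured values
)

# Turn "Mineral.Control" into "Mineral Control"
toy_long$type <- gsub(".", " ", toy_long$type, fixed = TRUE)

print(toy_long)
```

The result has one row per sample, which is the shape ggplot expects for `aes(type, frequencies)`.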
The syntax of ggplot is first the data, then its aesthetics (aes), which are essentially the variables of the graph. Additional layers can then be added with “+”.
Below, the fundamental ggplot is: ggplot(data, aes(x=type, y=frequencies, fill=type))
The ‘fill’ is the color of each boxplot. The color will be grouped up by the “type” of value (aka the x variable).
Then, the added layers are “geom_boxplot()”, which makes the output a boxplot, and “geom_point()”, which draws points on top of the boxplots.
ga <- ggplot(data_ga_wrangled, aes(type, frequencies, fill=type))
ga +
geom_boxplot() +
geom_point()
Next, let’s add color to the boxplots and a title (labs):
To fill the boxplots with specific colors manually, hex codes can be used (and can be found online). Note that we’ve already filled the boxplots using the ggplot aesthetic “fill” by type (above) and we are now overlaying the default fills with manual colors.
boxga <- ga + geom_boxplot() + geom_point() +
scale_fill_manual(values=c("#808080", "#B43757", "#808080", "#830300")) +
labs(title="glyceric acid Frequencies")
boxga
Additional manipulations can be done to the ggplot including changing the axis labels, font sizes, colors, legend placement, etc.
Here is the ggplot cheat sheet:
https://rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
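As a rough illustration of the manipulations mentioned above (axis labels, font size, legend placement), here is a small self-contained sketch. The data frame `toy` and its values are made up for this example and are not the tutorial's data.

```r
library(ggplot2)

# Made-up values for illustration only
toy <- data.frame(
  type = rep(c("Mineral Control", "Mineral Heated",
               "Organic Control", "Organic Heated"), each = 5),
  frequencies = c(3, 4, 5, 6, 7,   8, 9, 11, 12, 14,
                  2, 3, 4, 4, 6,   10, 12, 13, 15, 18)
)

p <- ggplot(toy, aes(type, frequencies, fill = type)) +
  geom_boxplot() +
  labs(x = "Treatment", y = "Frequency", fill = "Treatment") +  # axis and legend titles
  theme_bw(base_size = 14) +                                    # larger base font
  theme(legend.position = "bottom",                             # move the legend
        axis.text.x = element_text(angle = 45, hjust = 1))      # tilt the x labels
p
```

Note that `theme()` comes after `theme_bw()`; a complete theme added later would overwrite the individual settings.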
A great explanation of why transformations are necessary in biological statistics:
http://www.biostathandbook.com/transformation.html
Common transformations (referenced from https://fmwww.bc.edu/repec/bocode/t/transint.html and Professor Roy Thompson FRSE of The University of Edinburgh).
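For a feel of what such transformations do, the sketch below applies three widely used ones (log, square root, reciprocal) to a made-up right-skewed vector. Which transformations the referenced sources list is not reproduced here, so treat this selection as an assumption; the vector `x` is invented for illustration.

```r
# Made-up right-skewed values for illustration
x <- c(1, 2, 2, 3, 4, 6, 9, 15, 40, 120)

transformed <- data.frame(
  raw        = x,
  log10      = log10(x),   # compresses large values; needs x > 0
  sqrt       = sqrt(x),    # milder; often used for counts; needs x >= 0
  reciprocal = 1 / x       # strongest of the three; reverses the ordering
)
print(round(transformed, 2))

# Rough skew check: how far is the mean pulled away from the median?
skew_gap <- sapply(transformed, function(v) mean(v) - median(v))
print(round(skew_gap, 2))
```

For the raw values the mean sits far above the median; after the log transformation the two are much closer.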
Visual for transformations (from https://www.statisticssolutions.com/transforming-data-for-normality/).
boxga + scale_y_continuous(trans = 'log')
The default y-axis ticks do not line up well with the boxes. To fix that, the breaks and labels must be set using trans_breaks and trans_format. These functions can be found in the package “scales”.
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
# Label the log10 ticks as powers of ten (10^x) so the labels match the data scale
boxga + scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                      labels = trans_format("log10", math_format(10^.x)))
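If the helper functions feel opaque, breaks can also be given by hand. The sketch below is one alternative, assuming made-up data in `toy` and break positions chosen to suit it; it places ticks at powers of ten and prints them as plain numbers with `label_comma()` from scales.

```r
library(ggplot2)
library(scales)

# Made-up values spanning several orders of magnitude
toy <- data.frame(
  type = rep(c("A", "B"), each = 5),
  frequencies = c(1, 3, 10, 30, 100,   5, 20, 50, 200, 1000)
)

p_log <- ggplot(toy, aes(type, frequencies)) +
  geom_boxplot() +
  scale_y_log10(breaks = c(1, 10, 100, 1000),  # ticks at powers of ten
                labels = label_comma())        # plain numbers, e.g. 1,000
p_log
```

The data stay in their original units; only the spacing of the axis is logarithmic.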
More information and a tutorial: https://r4ds.had.co.nz/transform.html
Violin plots are similar to boxplots but include a ‘rotated kernel density’ plot on each side rather than a box.
Plotting the same data as a violin plot:
violin_ga <- ga + geom_violin()
violin_ga
You can display the median and quartiles in the form of a boxplot:
median_v_ga <- violin_ga + geom_boxplot(width=0.1, col="red")
median_v_ga
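Or the mean and standard deviation (note that mean_sdl draws the mean plus or minus 2 standard deviations by default and requires the Hmisc package):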
mean_v_ga <- violin_ga + stat_summary(fun.data=mean_sdl, geom="pointrange", color="red")
mean_v_ga
With such small datasets, the standard deviation lines are unsurprising. Personally, of these two options (displaying mean or median), I prefer the violin plot with the median.
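`stat_summary()` accepts any function that returns `y`, `ymin`, and `ymax`, so the range can be set explicitly. The sketch below draws the mean plus or minus one standard deviation without needing Hmisc. The helper `mean_sd` and the data frame `toy` are hypothetical names introduced for this example, and the values are made up.

```r
library(ggplot2)

# Made-up values for illustration only
toy <- data.frame(
  type = rep(c("Control", "Heated"), each = 6),
  frequencies = c(3, 4, 5, 5, 6, 8,   9, 11, 12, 14, 15, 19)
)

# Hypothetical helper: mean with a range of one standard deviation either side
mean_sd <- function(v) {
  data.frame(y = mean(v), ymin = mean(v) - sd(v), ymax = mean(v) + sd(v))
}

p_mean <- ggplot(toy, aes(type, frequencies, fill = type)) +
  geom_violin() +
  stat_summary(fun.data = mean_sd, geom = "pointrange", color = "red")
p_mean
```

Swapping `sd(v)` for `2 * sd(v)` inside the helper gives the wider two-standard-deviation range.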
More info is here: http://www.sthda.com/english/wiki/ggplot2-violin-plot-quick-start-guide-r-software-and-data-visualization